pacman::p_load(ggiraph, plotly,
patchwork, DT, tidyverse)Hands-on_Ex03
1. Overview of Hands-on Exercise 3
- I will be learning how to create interactive data visualization by using functions provided by
ggiraphandplotlyrpackages. - I will also be learning how to create animated data visualization by using
gganimateandplotly rpackages. In addition, I will be able to reshape data usingtidyrpackage, and (ii) process, wrangle and transform data by usingdplyrpackage.
1.1 Getting Started: Interactive Data Visualization
1.1.1. Install & Launch R packages
Install and Launch the following R packages:
ggiraph for making ‘ggplot’ graphics interactive.
plotly, R library for plotting interactive statistical graphs.
DT provides an R interface to the JavaScript library DataTables that create interactive table on html page.
tidyverse, a family of modern R packages specially designed to support data science, analysis and communication task including creating static statistical graphs.
patchwork for combining multiple ggplot2 graphs into one figure.
1.1.2. Import Data
The code chunk below read_csv() of readr package will import the Exam_data.csv and save as exam_data as a tibble data frame.
exam_data <- read_csv("data/Exam_data.csv")1.2 Getting Started: Animated Data Visualization
1.2.1. Install and launch R packages
Install and Launch the following R packages:
plotly, R library for plotting interactive statistical graphs.
gganimate, an ggplot extension for creating animated statistical graphs.
gifski converts video frames to GIF animations using pngquant’s fancy features for efficient cross-frame palettes and temporal dithering. It produces animated GIFs that use thousands of colors per frame.
gapminder: An excerpt of the data available at Gapminder.org. We just want to use its country_colors scheme.
tidyverse, a family of modern R packages specially designed to support data science, analysis and communication task including creating static statistical graphs.
pacman::p_load(readxl, gifski, gapminder,
plotly, gganimate, tidyverse)1.2.2 Import Data
Import data worksheet from GlobalPopulation Excel workbook.
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls", sheet="Data")%>%
mutate_each_(funs(factor(.)), col) %>%
mutate(Year = as.integer(Year))2.1 Interactive Data Visualization
2.1.1 ggiraph Methods
Tooltip: a column of datasets that contain tooltips to be displayed when the mouse is over elements
Data_id: a column of datasets that contain an id to be associated with elements.
Onclick: a column of datasets that contain JavaScript function to be executed when elements are clicked.
2.1.1.1 Tooltip effect
There are two steps that are needed (Step 1 and 2),
- Interactive version of ggplot2 is used to create the basic graph
girafe()is then utilized to generate an svg object to be displayed on an html page- Customize tooltip style
- Display statistics on tooltip
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
p#
By using the tooltip effect, there is interactivity by hovering the mouse pointer on an data point of interest where the information listed such as the student’s ID will be displayed.
We are able to display multiple information such as Name, Class, Race and Gender on tooltip as shown in the code chunk below.
exam_data$tooltip <- c(paste0("Name =", exam_data$ID, "\n Class =", exam_data$CLASS, "\n Race =", exam_data$RACE, "\n Gender =", exam_data$GENDER))
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = exam_data$tooltip),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618
)One example uses opts_tooltip() of ggiraph by adding in css declarations such as changing the background and font colours.
tooltip_css <- "background-color:white; #<<
font-style:bold; color:black;" #<<
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL, breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618,
options = list( #<<
opts_tooltip( #<<
css = tooltip_css)) #<<
) Statistics such as the 90% confident interval of the mean can be computed and displayed as shown in the code chunk below.
tooltip <- function(y, ymax, accuracy = .01) {
mean <- scales::number(y, accuracy = accuracy)
sem <- scales::number(ymax - y, accuracy = accuracy)
paste("Mean maths scores:", mean, "+/-", sem)
}
gg_point <- ggplot(data=exam_data,
aes(x = RACE),
) +
stat_summary(aes(y = MATHS,
tooltip = after_stat(
tooltip(y, ymax))),
fun.data = "mean_se",
geom = GeomInteractiveCol,
fill = "light blue"
) +
stat_summary(aes(y = MATHS),
fun.data = mean_se,
geom = "errorbar", width = 0.2, size = 0.2
)
girafe(ggobj = gg_point,
width_svg = 8,
height_svg = 8*0.618)2.1.1.2 data_id aesthetic
The code chunk below shows the hover effect that data_id can shown as one of the interactive features of ggiraph.
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618
) p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(data_id = CLASS), #default value of hover css fill is orange
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618
) p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(data_id = CLASS),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618,
options = list(
opts_hover(css = "fill: #202020;"),
opts_hover_inv(css = "opacity:0.2;")
)
) p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = CLASS,
data_id = CLASS),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618,
options = list(
opts_hover(css = "fill: #202020;"),
opts_hover_inv(css = "opacity:0.2;")
)
) 2.1.1.3 Onclick
This provides hotlink interactivity on the web when using the onclick argument of ggiraph where there is a web document link with a data object displayed on the top right hand corner of the figure upon mouse click.
exam_data$onclick <- sprintf("window.open(\"%s%s\")",
"https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
as.character(exam_data$ID))
## click actions need to be a "str" column containing javascript instructions
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(onclick = onclick),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618) 2.1.1.4 Coordinated Multiple Views with ggiraph
Use interactive functions of ggiraph such as data_id aesthetic to link observations and tooltip aesthetic to hover over a point with a mouse
Combine it with patchwork learned in Hands-on Exercise 2
p1 <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(data_id = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
coord_cartesian(xlim=c(0,100)) +
scale_y_continuous(NULL,
breaks = NULL)
p2 <- ggplot(data=exam_data,
aes(x = ENGLISH)) +
geom_dotplot_interactive(
aes(data_id = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
coord_cartesian(xlim=c(0,100)) +
scale_y_continuous(NULL,
breaks = NULL)
girafe(code = print(p1 + p2),
width_svg = 6,
height_svg = 3,
options = list(
opts_hover(css = "fill: #202020;"),
opts_hover_inv(css = "opacity:0.2;")
)
) 2.1.2 plotly Methods
There are two ways to use plotly:
- using
plot_ly() - using
ggploty()
plot_ly(data = exam_data,
x = ~MATHS,
y = ~GENDER,
colour = ~RACE)From using the functions subplot() and highlight_key(), I am able to compare results of students’ scores for Math, Science and English. I am also able to pinpoint any student by click on a data point of any one of the scatterplots to see the students’ scores.
hightlight_key() is used to share data and creates an object of class crosstalksubplot()helps to place plots side by side
d <- highlight_key(exam_data)
p1 <- ggplot(data=d,
aes(x = ENGLISH,
y = SCIENCE)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
p2 <- ggplot(data=d,
aes(x = ENGLISH,
y = MATHS)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
subplot(ggplotly(p1),
ggplotly(p2))2.1.3 crosstalk Methods
Crosstalk is an add-on to the htmlwidgets package. It extends htmlwidgets with a set of classes, functions and conventions for implementing cross-widgets interactions (currently, linked brushing and filtering). ::: panel-tabset ### 2.1.3.1 Interactive Data Table: DT package
A wrapper of JavaScript Library DataTables
Data objects in R can be rendered as HTML tables using JavaScript library “DataTables” via R Markdown or Shiny.
DT::datatable(exam_data, class = "compact")2.1.3.2 Linked brushing
Code chunk below is used to implement the coordinated brushing.
highlight() sets a variety of options for brushing (i.e. highlight) multiple plots. It is primarily designed to link multiple plotly graphs together and may not behaved as expected when linking plotly to another htmlwidget package via crosstalk. Some cases such as persistent selection in leaflet, other htmlwidgets will respect the options.
bscols() is a helper function of crosstalk by putting HTML elements next to each other. It can be called directly from the console but is designed specifically for R Markdown.
d <- highlight_key(exam_data)
p <- ggplot(d,
aes(ENGLISH, MATHS)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
gg <- highlight(ggplotly(p),
"plotly_selected")
crosstalk::bscols(gg, DT::datatable(d), widths = 5):::
3. Animated Data Visualization
3.1 Terminology
From this visualization type, we need to understand some key concepts and terminology used in this type of visualization.
Frame: In an animated line graph, each frame represents a different point in time or a different category. When the frame changes, the data points on the graph are updated to reflect the new data.
Animation Attributes: The animation attributes are the settings that control how the animation behaves. For example, you can specify the duration of each frame, the easing function used to transition between frames, and whether to start the animation from the current frame or from the beginning.
3.2 gganimate Methods
gganimate brings your static ggplot2 plots to life, turning them into animations. Some key components to note are explained simply below:
Imagine you’re animating a bouncing ball with ggplot2:
transition_time()decides when and where the ball moves (frame by frame).view_follow()makes the camera follow the ball.shadow_mark()shows the ball’s trail as it bounces.enter_bounce()makes the ball bounce into view.ease_aes()makes the motion look smooth and natural — not robotic.
3.2.1 Static bubble plots
transition_time()of gganimate is used to create transition through distinct states in time (i.e. Year).ease_aes()is used to control easing of aesthetics. The default islinear. Other methods are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce.
ggplot(globalPop, aes(x = Old, y = Young,
size = Population,
colour = Country)) +
geom_point(alpha = 0.7,
show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(title = 'Year: {frame_time}',
x = '% Aged',
y = '% Young') 
ggplot(globalPop, aes(x = Old, y = Young,
size = Population,
colour = Country)) +
geom_point(alpha = 0.7,
show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(title = 'Year: {frame_time}',
x = '% Aged',
y = '% Young') +
transition_time(Year) +
ease_aes('linear') 
3.2.2 plotly Methods
In Plotly R package, both ggplotly() and plot_ly() support key frame animations through the frame argument/aesthetic. They also support an ids argument/aesthetic to ensure smooth transitions between objects with the same id (which helps facilitate object constancy).
gg <- ggplot(globalPop,
aes(x = Old,
y = Young,
size = Population,
colour = Country)) +
geom_point(alpha = 0.7) + ## aes(frame & size) not working w ggplot anymore
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(x = '% Aged',
y = '% Young')
ggplotly(gg)gg <- ggplot(globalPop,
aes(x = Old,
y = Young,
size = Population,
colour = Country)) +
geom_point(alpha = 0.7) + # aes(size = Population, frame = Year) not working
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(x = '% Aged',
y = '% Young') +
theme(legend.position='none') #removes legend
ggplotly(gg)bp <- globalPop %>%
plot_ly(x = ~Old,
y = ~Young,
size = ~Population,
color = ~Continent,
sizes = c(2, 100),
frame = ~Year,
text = ~Country,
hoverinfo = "text",
type = 'scatter',
mode = 'markers'
) %>%
layout(showlegend = FALSE)
bp4. Additional Plot: Interactive map of Singapore with ggiraph
Based on the example shown in Hands-on Exercise 3, I created a interactive map of Singapore as well as the geospatial data of Singaporean residents by planning area in 2024 using data from Data Gov SG and Singstat.
Below shows the code chunk I used based on the data sets found.
Reading layer `MP14_PLNG_AREA_WEB_PL' from data source
`C:\Users\user1\Downloads\MasterPlan2014PlanningAreaBoundaryWebSHP\MP14_PLNG_AREA_WEB_PL.shp'
using driver `ESRI Shapefile'
Simple feature collection with 55 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
[1] "2024"
# A tibble: 6 × 5
`Planning Area` Subzone Age Sex `2024`
<chr> <chr> <chr> <chr> <chr>
1 Total Total Total Total 4180870
2 Total Total Total Males 2034960
3 Total Total Total Females 2145900
4 Total Total 0 Total 29250
5 Total Total 0 Males 14970
6 Total Total 0 Females 14280
[1] "Planning Area" "Subzone" "Age" "Sex"
[5] "2024"
[1] "OBJECTID" "PLN_AREA_N" "PLN_AREA_C" "CA_IND" "REGION_N"
[6] "REGION_C" "INC_CRC" "FMEL_UPD_D" "X_ADDR" "Y_ADDR"
[11] "SHAPE_Leng" "SHAPE_Area" "geometry"
# A tibble: 6 × 2
planning_area resident_population
<chr> <dbl>
1 Ang Mo Kio 1276360
2 Bedok 2216060
3 Bishan 703930
4 Boon Lay 40
5 Bukit Batok 1343100
6 Bukit Merah 1187950
ggplot(singapore_map) +
geom_sf(aes(fill = resident_population), color = "white") +
scale_fill_viridis_c(option = "plasma") +
theme_minimal() +
labs(title = "Singapore Population by Planning Area (2024)", fill = "Population")
I added in tooltip for the hovering effect in combination with ggiraph so that you can see population size by area.
#Area data
singapore_map$area_km2 <- as.numeric(singapore_map$SHAPE_Area) / 1e6
singapore_map$area_cat <- ntile(singapore_map$area_km2, 5)
singapore_map$area_cat <- factor(singapore_map$area_cat)
#Tooltip
singapore_map$tooltip <- paste0(
singapore_map$PLN_AREA_N, " — ",
round(singapore_map$area_km2, 2), " km²"
)
#ggiraph
gg <- ggplot(data = singapore_map) +
geom_sf_interactive(
aes(fill = area_cat, tooltip = tooltip),
color = "white"
) +
scale_fill_brewer(palette = "Blues") +
theme_minimal() +
labs(fill = "Area Size (quintiles)",title = "Singapore Population by Planning Area (2024)")
girafe(ggobj = gg)Data sets taken from Gov.tech.sg (from previous plot) and housingmap.sg. To determine the number of HDB flats in Singapore, I tried using a bar plot graph and animate it to show the different types of flats (e.g. 1-room, 2-room, 3-room, 4-room and 5-room)
library(tidyr)
library(dplyr)
library(gganimate)
library(ggplot2)
hdb_data <- read_excel("C:/shermainn/ISSS608new/Hands-on_Ex/Hands-on_Ex03/data/hestod2022e1.xlsx")
# Rename only the important columns
colnames(hdb_data) <- c("planning_area", "total_hdb", "one_two_room",
"three_room", "four_room", "five_executive_room")
hdb_data_clean <- hdb_data %>%
select(planning_area, total_hdb, one_two_room, three_room, four_room, five_executive_room) %>%
filter(!is.na(planning_area)) # Remove rows with missing planning_area
hdb_tidy <- hdb_data_clean %>%
pivot_longer(
cols = starts_with("one_two_room"):starts_with("five_executive_room"), # Long format for the flat types
names_to = "flat_type",
values_to = "hdb_flats"
) %>%
mutate(
hdb_flats = as.numeric(hdb_flats), # Ensure that the flat values are numeric
planning_area = toupper(trimws(planning_area)) # Clean the planning area names (if necessary)
) %>%
filter(!is.na(hdb_flats)) # Remove rows with missing flat counts
ggplot(hdb_tidy) +
geom_bar(aes(x = planning_area, y = hdb_flats, fill = flat_type), stat = "identity", position = "dodge") +
coord_flip() +
theme_minimal() +
labs(title = "Number of HDB Flats by Planning Area", x = "Planning Area", y = "Number of Flats", fill = "Flat Type")
animated_plot <- ggplot(hdb_tidy) +
geom_bar(aes(x = planning_area, y = hdb_flats, fill = flat_type), stat = "identity", position = "dodge") +
coord_flip() +
theme_minimal() +
labs(title = "Number of HDB Flats by Planning Area", x = "Planning Area", y = "Number of Flats", fill = "Flat Type") +
transition_states(flat_type, transition_length = 2, state_length = 1) +
ease_aes('linear') +
shadow_mark(alpha = 0.3)
animate(animated_plot, nframes = 100)
anim_save("hdb_animation.gif", animated_plot)